Suggesting Sounds for Images from Video Collections

Authors

Matthias Solèr

Jean Charles Bazin

Oliver Wang

Andreas Krause

Alexander Sorkine-Hornung

Abstract

Given a still image, humans can easily think of a sound associated with this image. For instance, people might associate the picture of a car with the sound of a car engine. In this paper we aim to retrieve sounds corresponding to a query image. To solve this challenging task, our approach exploits the correlation between the audio and visual modalities in video collections. A major difficulty is the high amount of uncorrelated audio in the videos, i.e., audio that does not correspond to the main image content, such as voice-over, background music, added sound effects, or sounds originating off-screen. We present an unsupervised, clustering-based solution that is able to automatically separate correlated sounds from uncorrelated ones. The core algorithm is based on a joint audio-visual feature space, in which we perform iterated mutual kNN clustering in order to effectively filter out uncorrelated sounds. To this end we also introduce a new dataset of correlated audio-visual data, on which we evaluate our approach and compare it to alternative solutions. Experiments show that our approach can successfully deal with a high amount of uncorrelated audio.
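The filtering idea named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no details of their procedure, so the function names (`mutual_knn_mask`, `iterated_mutual_knn_filter`), the parameters (`k`, `min_mutual`, `max_iters`), the Euclidean distance, and the stopping rule are all assumptions made for illustration. Each row of `features` is assumed to be one joint audio-visual descriptor, for example a visual feature vector concatenated with an audio feature vector from the same video segment. A sample is kept only while it has at least one mutual k-nearest neighbour, and the test is repeated on the survivors until nothing more is removed.

```python
import numpy as np


def mutual_knn_mask(features, k):
    """Boolean matrix M with M[i, j] True iff i and j are in each other's k nearest neighbours."""
    n = features.shape[0]
    # Pairwise squared Euclidean distances (assumed metric, not taken from the paper).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dists, np.inf)  # a sample is never its own neighbour
    k = min(k, n - 1)
    knn = np.zeros((n, n), dtype=bool)
    if k > 0:
        idx = np.argsort(dists, axis=1)[:, :k]
        knn[np.arange(n)[:, None], idx] = True
    # Mutual relation: j is among i's neighbours AND i is among j's neighbours.
    return knn & knn.T


def iterated_mutual_knn_filter(features, k=3, min_mutual=1, max_iters=20):
    """Return indices of samples that survive repeated mutual-kNN filtering (hypothetical helper)."""
    keep = np.arange(features.shape[0])
    for _ in range(max_iters):
        if keep.size == 0:
            break
        mutual = mutual_knn_mask(features[keep], k)
        ok = mutual.sum(axis=1) >= min_mutual
        if ok.all():
            break  # stable: every remaining sample has a mutual neighbour
        keep = keep[ok]
    return keep


# Toy data: six samples on a line form a coherent group; three isolated
# samples stand in for segments whose audio does not match the image content.
points = np.array(
    [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0],
     [100, 0], [0, 100], [-100, -100]],
    dtype=float,
)
kept = iterated_mutual_knn_filter(points, k=2)
```

In this toy setting the isolated samples are dropped because the grouped samples they are nearest to do not list them as neighbours in return. On real data the outcome would depend heavily on the feature extraction and on the choice of `k`.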

Similar resources

SIDF: A Novel Framework for Accurate Surgical Instrument Detection in Laparoscopic Video Frames

Background and Objectives: Identification of surgical instruments in laparoscopic video images has several biomedical applications. While several methods have been proposed for accurate detection of surgical instruments, the accuracy of these methods is still challenged by the high complexity of laparoscopic video images. This paper introduces a Surgical Instrument Detection Framework (SIDF) for a...

Extending SAR Image Despeckling methods for ViSAR Denoising

Synthetic Aperture Radar (SAR) is widely used in different weather conditions for various applications such as mapping, remote sensing, and urban, civil, and military monitoring. Recently, a new radar sensor called Video SAR (ViSAR) has been developed to capture sequential frames of moving objects for environmental monitoring applications. As with SAR images, the major problem of ViSAR is the pres...

Multimedia Search Technologies

One of the most ubiquitous activities related to learning in the digital age is “search”. In recent years, computers have rapidly evolved from numeric and text processing to include multimedia, specifically audio, video, and images. However, few methods exist for searching multimedia, apart from text-based strategies operating on keywords, metadata, and filenames. Creating text descriptions for m...

Fast Robust Large-scale Mapping from Video and Internet Photo Collections

This paper presents a system approaching fully automatic 3D modeling of large-scale environments. Our system takes as input either a video stream or a collection of photographs obtained from Internet photo-sharing websites such as Flickr. The system achieves high computational performance through algorithmic optimizations for efficient robust estimation, the use of image-based recognition for eff...

Linking the sounds of dolphins to their locations and behavior using video and multichannel acoustic recordings.

It is difficult to attribute underwater animal sounds to the individuals producing them. This paper presents a system developed to solve this problem for dolphins by linking acoustic locations of the sounds of captive bottlenose dolphins with an overhead video image. A time-delay beamforming algorithm localized dolphin sounds obtained from an array of hydrophones dispersed around a lagoon. The ...

Publication date: 2016
